{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "whjsJasuhstV"
   },
   "source": [
    "<a href=\"https://colab.research.google.com/github/jeffheaton/app_generative_ai/blob/main/t81_559_class_05_1_langchain_data.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "euOZxlIMhstX"
   },
   "source": [
    "# T81-559: Applications of Generative Artificial Intelligence\n",
    "**Module 5: LangChain: Data Extraction**\n",
    "* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)\n",
    "* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "d4Yov72PhstY"
   },
   "source": [
    "# Module 5 Material\n",
    "\n",
    "* **Part 5.1: Structured Output Parser** [[Video]](https://www.youtube.com/watch?v=62CSR141VRE) [[Notebook]](t81_559_class_05_1_langchain_data.ipynb)\n",
    "* Part 5.2: Other Parsers (CSV, JSON, Pandas, Datetime) [[Video]](https://www.youtube.com/watch?v=VXm8gPzU3qc) [[Notebook]](t81_559_class_05_2_parsers.ipynb)\n",
    "* Part 5.3: Pydantic parser [[Video]](https://www.youtube.com/watch?v=dc4fn-W60hg) [[Notebook]](t81_559_class_05_3_pydantic.ipynb)\n",
    "* Part 5.4: Custom Output Parser [[Video]](https://www.youtube.com/watch?v=jBpkAblQC_U) [[Notebook]](t81_559_class_05_4_custom_parsers.ipynb)\n",
    "* Part 5.5: Output-Fixing Parser [[Video]](https://www.youtube.com/watch?v=_txWiLjf4bo) [[Notebook]](t81_559_class_05_5_output_fixing_parsers.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "AcAUP0c3hstY"
   },
   "source": [
    "# Google CoLab Instructions\n",
    "\n",
    "The following code ensures that Google CoLab is running and maps Google Drive if needed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "xsI496h5hstZ",
    "outputId": "9b2ec89f-0a9b-40e2-98ee-4c88c3a11608"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Note: using Google CoLab\n",
      "Collecting langchain\n",
      "  Downloading langchain-0.1.17-py3-none-any.whl (867 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m867.6/867.6 kB\u001b[0m \u001b[31m4.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting langchain_openai\n",
      "  Downloading langchain_openai-0.1.5-py3-none-any.whl (34 kB)\n",
      "Requirement already satisfied: PyYAML>=5.3 in /usr/local/lib/python3.10/dist-packages (from langchain) (6.0.1)\n",
      "Requirement already satisfied: SQLAlchemy<3,>=1.4 in /usr/local/lib/python3.10/dist-packages (from langchain) (2.0.29)\n",
      "Requirement already satisfied: aiohttp<4.0.0,>=3.8.3 in /usr/local/lib/python3.10/dist-packages (from langchain) (3.9.5)\n",
      "Requirement already satisfied: async-timeout<5.0.0,>=4.0.0 in /usr/local/lib/python3.10/dist-packages (from langchain) (4.0.3)\n",
      "Collecting dataclasses-json<0.7,>=0.5.7 (from langchain)\n",
      "  Downloading dataclasses_json-0.6.5-py3-none-any.whl (28 kB)\n",
      "Collecting jsonpatch<2.0,>=1.33 (from langchain)\n",
      "  Downloading jsonpatch-1.33-py2.py3-none-any.whl (12 kB)\n",
      "Collecting langchain-community<0.1,>=0.0.36 (from langchain)\n",
      "  Downloading langchain_community-0.0.36-py3-none-any.whl (2.0 MB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.0/2.0 MB\u001b[0m \u001b[31m25.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting langchain-core<0.2.0,>=0.1.48 (from langchain)\n",
      "  Downloading langchain_core-0.1.50-py3-none-any.whl (302 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m302.8/302.8 kB\u001b[0m \u001b[31m28.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting langchain-text-splitters<0.1,>=0.0.1 (from langchain)\n",
      "  Downloading langchain_text_splitters-0.0.1-py3-none-any.whl (21 kB)\n",
      "Collecting langsmith<0.2.0,>=0.1.17 (from langchain)\n",
      "  Downloading langsmith-0.1.53-py3-none-any.whl (116 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m116.4/116.4 kB\u001b[0m \u001b[31m13.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: numpy<2,>=1 in /usr/local/lib/python3.10/dist-packages (from langchain) (1.25.2)\n",
      "Requirement already satisfied: pydantic<3,>=1 in /usr/local/lib/python3.10/dist-packages (from langchain) (2.7.1)\n",
      "Requirement already satisfied: requests<3,>=2 in /usr/local/lib/python3.10/dist-packages (from langchain) (2.31.0)\n",
      "Requirement already satisfied: tenacity<9.0.0,>=8.1.0 in /usr/local/lib/python3.10/dist-packages (from langchain) (8.2.3)\n",
      "Collecting openai<2.0.0,>=1.10.0 (from langchain_openai)\n",
      "  Downloading openai-1.25.1-py3-none-any.whl (312 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m312.9/312.9 kB\u001b[0m \u001b[31m27.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting tiktoken<1,>=0.5.2 (from langchain_openai)\n",
      "  Downloading tiktoken-0.6.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.8 MB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.8/1.8 MB\u001b[0m \u001b[31m57.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.3.1)\n",
      "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (23.2.0)\n",
      "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.4.1)\n",
      "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (6.0.5)\n",
      "Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.9.4)\n",
      "Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.5.7->langchain)\n",
      "  Downloading marshmallow-3.21.2-py3-none-any.whl (49 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m49.3/49.3 kB\u001b[0m \u001b[31m5.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting typing-inspect<1,>=0.4.0 (from dataclasses-json<0.7,>=0.5.7->langchain)\n",
      "  Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB)\n",
      "Collecting jsonpointer>=1.9 (from jsonpatch<2.0,>=1.33->langchain)\n",
      "  Downloading jsonpointer-2.4-py2.py3-none-any.whl (7.8 kB)\n",
      "Collecting packaging<24.0,>=23.2 (from langchain-core<0.2.0,>=0.1.48->langchain)\n",
      "  Downloading packaging-23.2-py3-none-any.whl (53 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m53.0/53.0 kB\u001b[0m \u001b[31m5.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting orjson<4.0.0,>=3.9.14 (from langsmith<0.2.0,>=0.1.17->langchain)\n",
      "  Downloading orjson-3.10.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (142 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m142.7/142.7 kB\u001b[0m \u001b[31m13.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: anyio<5,>=3.5.0 in /usr/local/lib/python3.10/dist-packages (from openai<2.0.0,>=1.10.0->langchain_openai) (3.7.1)\n",
      "Requirement already satisfied: distro<2,>=1.7.0 in /usr/lib/python3/dist-packages (from openai<2.0.0,>=1.10.0->langchain_openai) (1.7.0)\n",
      "Collecting httpx<1,>=0.23.0 (from openai<2.0.0,>=1.10.0->langchain_openai)\n",
      "  Downloading httpx-0.27.0-py3-none-any.whl (75 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m75.6/75.6 kB\u001b[0m \u001b[31m8.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: sniffio in /usr/local/lib/python3.10/dist-packages (from openai<2.0.0,>=1.10.0->langchain_openai) (1.3.1)\n",
      "Requirement already satisfied: tqdm>4 in /usr/local/lib/python3.10/dist-packages (from openai<2.0.0,>=1.10.0->langchain_openai) (4.66.2)\n",
      "Requirement already satisfied: typing-extensions<5,>=4.7 in /usr/local/lib/python3.10/dist-packages (from openai<2.0.0,>=1.10.0->langchain_openai) (4.11.0)\n",
      "Requirement already satisfied: annotated-types>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from pydantic<3,>=1->langchain) (0.6.0)\n",
      "Requirement already satisfied: pydantic-core==2.18.2 in /usr/local/lib/python3.10/dist-packages (from pydantic<3,>=1->langchain) (2.18.2)\n",
      "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2->langchain) (3.3.2)\n",
      "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2->langchain) (3.7)\n",
      "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2->langchain) (2.0.7)\n",
      "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2->langchain) (2024.2.2)\n",
      "Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.10/dist-packages (from SQLAlchemy<3,>=1.4->langchain) (3.0.3)\n",
      "Requirement already satisfied: regex>=2022.1.18 in /usr/local/lib/python3.10/dist-packages (from tiktoken<1,>=0.5.2->langchain_openai) (2023.12.25)\n",
      "Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio<5,>=3.5.0->openai<2.0.0,>=1.10.0->langchain_openai) (1.2.1)\n",
      "Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai<2.0.0,>=1.10.0->langchain_openai)\n",
      "  Downloading httpcore-1.0.5-py3-none-any.whl (77 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m77.9/77.9 kB\u001b[0m \u001b[31m8.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->openai<2.0.0,>=1.10.0->langchain_openai)\n",
      "  Downloading h11-0.14.0-py3-none-any.whl (58 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m58.3/58.3 kB\u001b[0m \u001b[31m6.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting mypy-extensions>=0.3.0 (from typing-inspect<1,>=0.4.0->dataclasses-json<0.7,>=0.5.7->langchain)\n",
      "  Downloading mypy_extensions-1.0.0-py3-none-any.whl (4.7 kB)\n",
      "Installing collected packages: packaging, orjson, mypy-extensions, jsonpointer, h11, typing-inspect, tiktoken, marshmallow, jsonpatch, httpcore, langsmith, httpx, dataclasses-json, openai, langchain-core, langchain-text-splitters, langchain_openai, langchain-community, langchain\n",
      "  Attempting uninstall: packaging\n",
      "    Found existing installation: packaging 24.0\n",
      "    Uninstalling packaging-24.0:\n",
      "      Successfully uninstalled packaging-24.0\n",
      "Successfully installed dataclasses-json-0.6.5 h11-0.14.0 httpcore-1.0.5 httpx-0.27.0 jsonpatch-1.33 jsonpointer-2.4 langchain-0.1.17 langchain-community-0.0.36 langchain-core-0.1.50 langchain-text-splitters-0.0.1 langchain_openai-0.1.5 langsmith-0.1.53 marshmallow-3.21.2 mypy-extensions-1.0.0 openai-1.25.1 orjson-3.10.2 packaging-23.2 tiktoken-0.6.0 typing-inspect-0.9.0\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "\n",
    "try:\n",
    "    from google.colab import drive, userdata\n",
    "    COLAB = True\n",
    "    print(\"Note: using Google CoLab\")\n",
    "except:\n",
    "    print(\"Note: not using Google CoLab\")\n",
    "    COLAB = False\n",
    "\n",
    "# OpenAI Secrets\n",
    "if COLAB:\n",
    "    os.environ[\"OPENAI_API_KEY\"] = userdata.get('OPENAI_API_KEY')\n",
    "\n",
    "# Install needed libraries in CoLab\n",
    "if COLAB:\n",
    "    !pip install langchain langchain_openai"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "pC9A-LaYhsta"
   },
   "source": [
    "# 5.1: Structured Output Parser\n",
    "\n",
    "LangChain offers a diverse array of output parsers designed to efficiently extract and structure information from the outputs returned by large language models (LLMs). These output parsers play a crucial role in LangChain's architecture, typically forming integral components of what are known as LangChain chains. These chains are configurable sequences of operations that process and interact with model outputs to perform complex tasks. To facilitate the creation and management of these chains, LangChain introduces the LangChain Expression Language (LCEL).\n",
    "\n",
    "Next, we will delve into how LCEL enables the advanced creation and execution of LangChain chains, and subsequently explore how different output parsers can be utilized within these chains to achieve precise and actionable insights from LLM outputs.\n",
    "\n",
    "## LangChain Expression Language, or LCEL\n",
    "\n",
    "LangChain Expression Language, or [LCEL](https://python.langchain.com/docs/expression_language/), provides a declarative way to easily compose chains. From its inception, LCEL has supported transitioning prototypes to production without any code changes. This includes everything from simple \"prompt + LLM\" chains to highly complex chains that some users have successfully implemented with hundreds of steps in production environments. Here are several reasons why you might choose LCEL:\n",
    "\n",
    "* [First-class streaming support](https://python.langchain.com/docs/expression_language/streaming/): By building your chains with LCEL, you achieve the optimal time-to-first-token, which is the time elapsed until the first output chunk appears. In some instances, this means streaming tokens directly from an LLM to a streaming output parser, delivering parsed, incremental chunks of output at the same rate the LLM provider releases the raw tokens.\n",
    "\n",
    "* [Async support](https://python.langchain.com/docs/expression_language/interface/): Chains built with LCEL can use both synchronous APIs (for example, in your Jupyter notebook during prototyping) and asynchronous APIs (for example, in a LangServe server). This dual capability allows the same code to perform efficiently in both prototypes and production, handling many concurrent requests on the same server.\n",
    "\n",
    "* [Optimized parallel execution](https://python.langchain.com/docs/expression_language/primitives/parallel/): LCEL automatically executes parallelizable steps in your chains simultaneously, whether you are using synchronous or asynchronous interfaces, minimizing latency.\n",
    "\n",
    "* [Retries and fallbacks](https://python.langchain.com/docs/guides/productionization/fallbacks/): You can configure retries and fallbacks for any part of your LCEL chain, enhancing reliability at scale. We are also enhancing streaming support for retries and fallbacks to improve reliability without increasing latency.\n",
    "\n",
    "* [Access to intermediate results](https://python.langchain.com/docs/expression_language/interface/#async-stream-events-beta): For more complex chains, accessing intermediate results can be crucial. This feature allows end-users to see progress or helps developers debug the chain. Intermediate results are streamable and available on every LangServe server.\n",
    "\n",
    "* [Input and output schemas](https://python.langchain.com/docs/expression_language/interface/#input-schema): Every LCEL chain comes with Pydantic and JSONSchema schemas inferred from your chain's structure. These schemas are essential for validating inputs and outputs and are a core feature of LangServe.\n",
    "\n",
    "* [Seamless LangSmith tracing](https://python.langchain.com/docs/langsmith/): As chains grow in complexity, it's vital to trace each step. LCEL automatically logs all steps to LangSmith for enhanced observability and debuggability.\n",
    "\n",
    "* [Seamless LangServe deployment](https://python.langchain.com/docs/langserve/): Deploying a chain created with LCEL is straightforward using LangServe, facilitating seamless transitions from development to production."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "id": "-Vds7BAWITTH"
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HMLrKTqgSRkM"
   },
   "source": [
    "# Introduction to the StructuredOutputParser\n",
    "\n",
    "LangChain offers a variety of tools designed to help process and extract information from the output of large language models (LLMs). In this section, we will explore the capabilities of the Structured Output Parser, which is particularly useful when you need to return information across multiple fields. This parser is adept at organizing and segmenting model output into distinct categories, making the data more manageable and interpretable. Although the Pydantic/JSON parser provides a more robust and feature-rich option for handling complex data structures, the Structured Output Parser is an excellent choice for environments where computational resources are limited, or when using less powerful models. It offers a straightforward and efficient way to structure output without overwhelming the model or the system.\n",
    "\n",
    "We begin by creating a ResponseSchema for each value we wish to extract. We must describe each value. We construct the StructuredOutputParser from a list of these schemas."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "id": "Uu5zOki19kKR"
   },
   "outputs": [],
   "source": [
    "from langchain.output_parsers import ResponseSchema, StructuredOutputParser\n",
    "from langchain_core.prompts import PromptTemplate\n",
    "from langchain_openai import ChatOpenAI\n",
    "\n",
    "response_schemas = [\n",
    "    ResponseSchema(name=\"answer\", description=\"answer to the user's question\"),\n",
    "    ResponseSchema(\n",
    "        name=\"source\",\n",
    "        description=\"source used to answer the user's question, should be a website.\",\n",
    "    ),\n",
    "]\n",
    "output_parser = StructuredOutputParser.from_response_schemas(response_schemas)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "_dQTrDn8TuUk"
   },
   "source": [
    "We now construct a prompt template that allows both the question and formatting instructions to be specified. The schemas that we just created will generate the formatting instructions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "id": "mXwzsoJU9kAY"
   },
   "outputs": [],
   "source": [
    "format_instructions = output_parser.get_format_instructions()\n",
    "prompt = PromptTemplate(\n",
    "    template=\"answer the users question as best as possible.\\n{format_instructions}\\n{question}\",\n",
    "    input_variables=[\"question\"],\n",
    "    partial_variables={\"format_instructions\": format_instructions},\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "_DDMfoBUUDjs"
   },
   "source": [
    "Before we query the LLM, let's provide a question and see how LangChang constructs the prompt."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "IX9IhDJpBDNN",
    "outputId": "d2f9fcb1-a3a6-4f05-f26a-72080ac7e355"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "answer the users question as best as possible.\n",
      "The output should be a markdown code snippet formatted in the following schema, including the leading and trailing \"```json\" and \"```\":\n",
      "\n",
      "```json\n",
      "{\n",
      "\t\"answer\": string  // answer to the user's question\n",
      "\t\"source\": string  // source used to answer the user's question, should be a website.\n",
      "}\n",
      "```\n",
      "When was the Python programming language introduced?\n"
     ]
    }
   ],
   "source": [
    "question = \"When was the Python programming language introduced?\"\n",
    "\n",
    "print(prompt.invoke(question).text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "sCC1bV0dUZgt"
   },
   "source": [
    "As you can see, the schemas LangChain uses the schemas to specify a JSON format to return the answer. With the answer in this JSON format, it is straightforward to parse the individual values.\n",
    "\n",
    "We now construct a chain to use the StructuredOutputParser and query the LLM."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "id": "CWjcI5zB9j9k"
   },
   "outputs": [],
   "source": [
    "MODEL = \"gpt-4o-mini\"\n",
    "TEMPERATURE = 0\n",
    "\n",
    "# Initialize the OpenAI LLM with your API key\n",
    "llm = ChatOpenAI(\n",
    "    model=MODEL,\n",
    "    temperature=TEMPERATURE,\n",
    "    n=1\n",
    ")\n",
    "chain = prompt | llm | output_parser"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "gux36efBgSR9"
   },
   "source": [
    "Now, we present a question, an answer, and a source for the question."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "SxxA2nKN9j7B",
    "outputId": "f035418e-3968-45a7-f1a4-d79d62a8f954"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'answer': 'Python programming language was introduced in 1991.', 'source': 'https://en.wikipedia.org/wiki/Python_(programming_language)'}\n"
     ]
    }
   ],
   "source": [
    "result = chain.invoke({\"question\": question})\n",
    "print(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "wR5J9U9bg5Br"
   },
   "source": [
    "## Detect and Translate\n",
    "\n",
    "We will now try an example with more values. The following program accepts text in any language and will translate it into French, Spanish, and Chinese."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "id": "9snGgL4riemY"
   },
   "outputs": [],
   "source": [
    "response_schemas = [\n",
    "    ResponseSchema(name=\"detected\", description=\"The language of the user's input\"),\n",
    "    ResponseSchema(name=\"spanish\", description=\"Spanish translation\"),\n",
    "    ResponseSchema(name=\"french\", description=\"French translation\"),\n",
    "    ResponseSchema(name=\"chinese\", description=\"Chinese translation\"),\n",
    "]\n",
    "output_parser = StructuredOutputParser.from_response_schemas(response_schemas)\n",
    "\n",
    "format_instructions = output_parser.get_format_instructions()\n",
    "prompt = PromptTemplate(\n",
    "    template=\"translate into the requested languages.\\n{format_instructions}\\n{question}\",\n",
    "    input_variables=[\"question\"],\n",
    "    partial_variables={\"format_instructions\": format_instructions},\n",
    ")\n",
    "\n",
    "chain = prompt | llm | output_parser"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "sa6_dAk2ugre"
   },
   "source": [
    "We begin by trying an English sentence and observe it translated into the other three languages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "c9wKUJR3tiYP",
    "outputId": "2e7a2aa4-8758-4135-d2e3-d834ccfcf44a"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'detected': 'en', 'spanish': '¿Cuándo se introdujo el lenguaje de programación Python?', 'french': 'Quand le langage de programmation Python a-t-il été introduit?', 'chinese': 'Python编程语言是什么时候推出的？'}\n"
     ]
    }
   ],
   "source": [
    "question = \"When was the Python programming language introduced?\"\n",
    "result = chain.invoke({\"question\": question})\n",
    "print(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "w7QKN6j8vNGF"
   },
   "source": [
    "We also observe that it can detect and translate Swahili."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "pDtvdS6htpMg",
    "outputId": "80957b38-8ad2-4586-bcc9-6f8045462ca7"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'detected': 'Swahili', 'spanish': '¿Qué está pasando?', 'french': \"Qu'est-ce qui se passe?\", 'chinese': '我不知道发生了什么？'}\n"
     ]
    }
   ],
   "source": [
    "question = \"Sijui nini kinaendelea?\"\n",
    "result = chain.invoke({\"question\": question})\n",
    "print(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "3TjZs_TRht1n"
   },
   "source": [
    "# Module 5 Assignment\n",
    "\n",
    "You can find the first assignment here: [assignment 5](https://github.com/jeffheaton/app_generative_ai/blob/main/assignments/assignment_yourname_class5.ipynb)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3.11 (genai)",
   "language": "python",
   "name": "genai"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.8"
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
